59 research outputs found
Light Field Denoising via Anisotropic Parallax Analysis in a CNN Framework
Light field (LF) cameras provide perspective information of scenes by taking
directional measurements of the focusing light rays. The raw outputs are
usually dark with additive camera noise, which impedes subsequent processing
and applications. We propose a novel LF denoising framework based on
anisotropic parallax analysis (APA). Two convolutional neural networks are
jointly designed for the task: first, the structural parallax synthesis network
predicts the parallax details for the entire LF based on a set of anisotropic
parallax features. These novel features can efficiently capture the high
frequency perspective components of a LF from noisy observations. Second, the
view-dependent detail compensation network restores non-Lambertian variation to
each LF view by involving view-specific spatial energies. Extensive experiments
show that the proposed APA LF denoiser provides a much better denoising
performance than state-of-the-art methods in terms of visual quality and in
preservation of parallax details
Accurate Light Field Depth Estimation with Superpixel Regularization over Partially Occluded Regions
Depth estimation is a fundamental problem for light field photography
applications. Numerous methods have been proposed in recent years, which either
focus on crafting cost terms for more robust matching, or on analyzing the
geometry of scene structures embedded in the epipolar-plane images. Significant
improvements have been made in terms of overall depth estimation error;
however, current state-of-the-art methods still show limitations in handling
intricate occluding structures and complex scenes with multiple occlusions. To
address these challenging issues, we propose a very effective depth estimation
framework which focuses on regularizing the initial label confidence map and
edge strength weights. Specifically, we first detect partially occluded
boundary regions (POBR) via superpixel based regularization. Series of
shrinkage/reinforcement operations are then applied on the label confidence map
and edge strength weights over the POBR. We show that after weight
manipulations, even a low-complexity weighted least squares model can produce
much better depth estimation than state-of-the-art methods in terms of average
disparity error rate, occlusion boundary precision-recall rate, and the
preservation of intricate visual features
Low-latency compression of mocap data using learned spatial decorrelation transform
Due to the growing needs of human motion capture (mocap) in movie, video
games, sports, etc., it is highly desired to compress mocap data for efficient
storage and transmission. This paper presents two efficient frameworks for
compressing human mocap data with low latency. The first framework processes
the data in a frame-by-frame manner so that it is ideal for mocap data
streaming and time critical applications. The second one is clip-based and
provides a flexible tradeoff between latency and compression performance. Since
mocap data exhibits some unique spatial characteristics, we propose a very
effective transform, namely learned orthogonal transform (LOT), for reducing
the spatial redundancy. The LOT problem is formulated as minimizing square
error regularized by orthogonality and sparsity and solved via alternating
iteration. We also adopt a predictive coding and temporal DCT for temporal
decorrelation in the frame- and clip-based frameworks, respectively.
Experimental results show that the proposed frameworks can produce higher
compression performance at lower computational cost and latency than the
state-of-the-art methods.Comment: 15 pages, 9 figure
Human Motion Capture Data Tailored Transform Coding
Human motion capture (mocap) is a widely used technique for digitalizing
human movements. With growing usage, compressing mocap data has received
increasing attention, since compact data size enables efficient storage and
transmission. Our analysis shows that mocap data have some unique
characteristics that distinguish themselves from images and videos. Therefore,
directly borrowing image or video compression techniques, such as discrete
cosine transform, does not work well. In this paper, we propose a novel
mocap-tailored transform coding algorithm that takes advantage of these
features. Our algorithm segments the input mocap sequences into clips, which
are represented in 2D matrices. Then it computes a set of data-dependent
orthogonal bases to transform the matrices to frequency domain, in which the
transform coefficients have significantly less dependency. Finally, the
compression is obtained by entropy coding of the quantized coefficients and the
bases. Our method has low computational cost and can be easily extended to
compress mocap databases. It also requires neither training nor complicated
parameter setting. Experimental results demonstrate that the proposed scheme
significantly outperforms state-of-the-art algorithms in terms of compression
performance and speed
Weakly-supervised Part-Attention and Mentored Networks for Vehicle Re-Identification
Vehicle re-identification (Re-ID) aims to retrieve images with the same
vehicle ID across different cameras. Current part-level feature learning
methods typically detect vehicle parts via uniform division, outside tools, or
attention modeling. However, such part features often require expensive
additional annotations and cause sub-optimal performance in case of unreliable
part mask predictions. In this paper, we propose a weakly-supervised
Part-Attention Network (PANet) and Part-Mentored Network (PMNet) for Vehicle
Re-ID. Firstly, PANet localizes vehicle parts via part-relevant channel
recalibration and cluster-based mask generation without vehicle part
supervisory information. Secondly, PMNet leverages teacher-student guided
learning to distill vehicle part-specific features from PANet and performs
multi-scale global-part feature extraction. During inference, PMNet can
adaptively extract discriminative part features without part localization by
PANet, preventing unstable part mask predictions. We address this Re-ID issue
as a multi-task problem and adopt Homoscedastic Uncertainty to learn the
optimal weighing of ID losses. Experiments are conducted on two public
benchmarks, showing that our approach outperforms recent methods, which require
no extra annotations by an average increase of 3.0% in CMC@5 on VehicleID and
over 1.4% in mAP on VeRi776. Moreover, our method can extend to the occluded
vehicle Re-ID task and exhibits good generalization ability.Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessibl
- …